17 research outputs found

    Optimal Fuzzy Model Construction with Statistical Information using Genetic Algorithm

    Fuzzy rule-based models can approximate any continuous function to any degree of accuracy on a compact domain. Most fuzzy logic controller (FLC) design processes rely on the heuristic knowledge of experienced operators. To automate the design process, we present a genetic approach that learns fuzzy rules as well as membership function parameters. Moreover, several statistical information criteria, such as the Akaike information criterion (AIC), the Bhansali-Downham information criterion (BDIC), and the Schwarz-Rissanen information criterion (SRIC), are used to construct optimal fuzzy models by reducing the number of fuzzy rules. A genetic scheme is used to design a Takagi-Sugeno-Kang (TSK) model, identifying both the antecedent rule parameters and the consequent parameters. Computer simulations are presented that confirm the performance of the constructed fuzzy logic controller.
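
As a rough illustration of how an information criterion can trade fit quality against the number of rules, the sketch below scores candidate models with the standard least-squares forms of AIC and the Schwarz criterion (BIC), which is assumed here to correspond to what the abstract calls SRIC. The candidate names, residual sums of squares, and parameter counts are invented for the example, and BDIC is left out because the abstract does not give its form.

```python
import math

def aic(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with Gaussian errors (additive constant dropped).

    rss: residual sum of squares, n: number of samples, k: number of free parameters.
    """
    return n * math.log(rss / n) + 2 * k

def bic(rss: float, n: int, k: int) -> float:
    """Schwarz criterion (BIC) in the same least-squares form."""
    return n * math.log(rss / n) + k * math.log(n)

def select_model(candidates, n, criterion=aic):
    """Return the (name, rss, k) tuple with the lowest criterion value."""
    return min(candidates, key=lambda c: criterion(c[1], n, c[2]))

# Hypothetical candidates: more rules fit better but cost more parameters.
candidates = [
    ("9 rules", 4.0, 27),
    ("5 rules", 4.4, 15),
    ("3 rules", 9.0, 9),
]
best_by_aic = select_model(candidates, n=200, criterion=aic)
best_by_bic = select_model(candidates, n=200, criterion=bic)
print(best_by_aic[0], best_by_bic[0])
```

With these made-up numbers the five-rule model wins under both criteria: dropping from nine to five rules costs little accuracy, while dropping to three rules costs a lot.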

    In vitro cultivation and regeneration of Solanum melongena (L.) using stem, root and leaf explants

    The treatment combinations were BAP (0, 2.0, 3.0 and 4.0 mg/L) and NAA (0, 0.1, 0.5 and 1.0 mg/L). The rate of callus formation varied among treatments. The highest amount of callus (48.66%) was produced from stem explants on MS medium containing 2.0 mg/L BAP and 0.5 mg/L NAA, with 8.2 days required for callus induction. The number of shoots regenerated through callus from stem explants on medium containing 2.0 mg/L BAP and 0.5 mg/L NAA was 3.4 (23.287%), with 38.8 days required. Key words: Regeneration; BAP; NAA. Nepal Journal of Biotechnology. Jan. 2011, Vol. 1, No. 1 : 49-5

    Exploiting Complex Protein Domain Networks for Protein Function Annotation

    Huge numbers of protein sequences are now available in public databases. To exploit this valuable biological data more fully, these sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. In the March 2018 release of UniProtKB, some 556,000 sequences have been manually curated, but over 111 million sequences still lack functional annotations. The ability to annotate these sequences automatically would represent a major advance for the field of bioinformatics. Here, we present a novel network-based approach called GrAPFI for the automatic functional annotation of protein sequences. The underlying assumption of GrAPFI is that proteins may be related to each other by the protein domains, families, and super-families that they share. Several protein domain databases exist, such as InterPro, Pfam, SMART, CDD, Gene3D, and Prosite. Our approach uses InterPro domains, because the InterPro database contains information from several other major protein family and domain databases. Our results show that GrAPFI achieves better EC number annotation performance than several other previously described approaches.

    GrAPFI: predicting enzymatic function of proteins from domain similarity graphs

    This work is dedicated to the memory of David W. Ritchie, who recently passed away. Background: Thanks to recent developments in genomic sequencing technologies, the number of protein sequences in public databases is growing enormously. To enrich and exploit this immensely valuable data, it is essential to annotate these sequences with functional properties such as Enzyme Commission (EC) numbers. The January 2019 release of the UniProt Knowledgebase (UniProtKB) contains around 140 million protein sequences. However, only about half a million of these (UniProtKB/SwissProt) have been reviewed and functionally annotated by expert curators using data extracted from the literature and computational analyses. To reduce the gap between the annotated and unannotated protein sequences, it is essential to develop accurate automatic protein function annotation techniques. Results: In this work, we present GrAPFI (Graph-based Automatic Protein Function Inference) for automatically annotating proteins with EC number functional descriptors from a protein domain similarity graph. We validated the performance of GrAPFI using six reference proteomes in UniProtKB/SwissProt, namely Human, Mouse, Rat, Yeast, E. coli and Arabidopsis thaliana. We also compared GrAPFI with existing EC prediction approaches such as ECPred, DEEPre, and SVMProt. The comparison shows that GrAPFI achieves better accuracy and comparable or better coverage than these earlier approaches. Conclusions: GrAPFI is a novel protein function annotation tool that performs automatic inference on a network of proteins that are related according to their domain composition. Our evaluation of GrAPFI shows that it gives better performance than other state-of-the-art methods. GrAPFI is available at https://gitlab.inria.fr/bsarker/bmc_grapfi.git as a stand-alone tool written in Python.
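
To make the idea of inference over a domain similarity graph more concrete, here is a minimal sketch. It is not the GrAPFI implementation: it assumes Jaccard similarity between domain sets as the edge weight and a simple similarity-weighted vote among annotated neighbours, and the protein identifiers, domain names, and the `min_sim` threshold are all hypothetical.

```python
from collections import defaultdict

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of domain identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_ec(query_domains: set, annotated: dict, min_sim: float = 0.3):
    """Predict an EC number by a similarity-weighted vote over annotated neighbours.

    annotated maps protein id -> (set of domains, EC number).
    Returns None when no annotated protein is similar enough.
    """
    scores = defaultdict(float)
    for _pid, (domains, ec) in annotated.items():
        sim = jaccard(query_domains, domains)
        if sim >= min_sim:          # keep only edges above the threshold
            scores[ec] += sim       # each neighbour votes with its edge weight
    if not scores:
        return None
    return max(scores, key=scores.get)

# Toy data: identifiers and labels are placeholders, not real annotations.
annotated = {
    "P1": ({"dom_A", "dom_B"}, "1.1.1.1"),
    "P2": ({"dom_A", "dom_B", "dom_C"}, "1.1.1.1"),
    "P3": ({"dom_D"}, "2.7.11.1"),
}
print(predict_ec({"dom_A", "dom_B"}, annotated))
print(predict_ec({"dom_Z"}, annotated))
```

In this toy setup the first query shares domains with P1 and P2 and so inherits their label, while the second query has no sufficiently similar neighbour and stays unannotated, which is the trade-off between accuracy and coverage that the abstract evaluates.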

    Functional Annotation of Proteins using Domain Embedding based Sequence Classification

    Owing to recent advances in genomic sequencing technologies, the number of protein sequences in public databases is growing exponentially. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. The May 2019 release of UniProtKB contains around 158 million protein sequences. To exploit this huge knowledge base fully, protein sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. However, only about half a million sequences (UniProtKB/SwissProt) have been reviewed and functionally annotated by expert curators using information extracted from the published literature and computational analyses. Manual annotation by experts is expensive, slow and insufficient to fill the gap between the annotated and unannotated protein sequences. In this paper, we present an automatic functional annotation technique using neural network-based word embedding that exploits the domain and family information of proteins. Domains are the most conserved regions in protein sequences and constitute the building blocks of 3D protein structures. For the experiments, we used fastText [a], a library for learning word embeddings and text classification developed by Facebook's AI Research lab. The experimental results show that domain embeddings perform much better than k-mer based word embeddings. [a] https://github.com/facebookresearch/fasttex
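
One way to picture the "proteins as sentences of domains" formulation is to write each protein as a line of domain tokens prefixed by its label, which is the plain-text input format fastText uses for supervised classification (labels are tokens starting with `__label__`). The sketch below only builds such lines; the domain names and EC labels are placeholders, and the helper names are illustrative rather than taken from the paper.

```python
def to_fasttext_line(domains, ec_label):
    """Format one protein as a fastText supervised-training line.

    domains: ordered list of domain identifiers (the "words" of the sentence).
    ec_label: the class label, e.g. an EC number.
    """
    return f"__label__{ec_label} " + " ".join(domains)

def build_corpus(proteins):
    """proteins: iterable of (domains, ec_label) pairs -> list of training lines."""
    return [to_fasttext_line(domains, label) for domains, label in proteins]

# Toy proteins with placeholder domain identifiers.
proteins = [
    (["dom_A", "dom_B"], "1.1.1.1"),
    (["dom_D"], "2.7.11.1"),
]
corpus = build_corpus(proteins)
for line in corpus:
    print(line)
```

Lines like these could be written to a text file and passed to the fastText Python bindings, for example `fasttext.train_supervised(input="train.txt")`; the file name is hypothetical, and the hyperparameters used in the paper are not stated in the abstract.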

    Prot-A-GAN: Automatic Protein Function Annotation using GAN-inspired Knowledge Graph Embedding

    Proteins perform various functions in living organisms. Automatic protein function annotation is defined as finding appropriate associations between proteins and functional labels such as Gene Ontology (GO) terms. In this paper, we present a preliminary exploration of the potential of generative adversarial networks (GAN) for protein function annotation. The Prot-A-GAN approach uses GAN-like adversarial training to learn embeddings of the nodes and relations in a heterogeneous knowledge graph. Following GAN terminology, we first train a discriminator using domain-adaptive negative sampling to discriminate between positive and negative triples, and then train a generator to guide a random walk over the knowledge graph that identifies paths between proteins and GO annotations. We evaluate the method by performing protein function annotation using GO terms on human disease proteins from UniProtKB/SwissProt. As a proof of concept, the experiments show promising results and open up new avenues for further exploration in protein function annotation.
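
The notion of positive and negative triples can be illustrated with a generic tail-corruption sampler. This is plain uniform negative sampling, not the domain-adaptive scheme the abstract refers to (which is not described here), and the entity names and the `has_function` relation are hypothetical.

```python
import random

def corrupt_tail(triple, entities, known, rng):
    """Build a negative triple by replacing the tail with another entity.

    triple: a known (head, relation, tail) fact.
    entities: candidate replacement tails (e.g. all GO term nodes).
    known: set of all positive triples, used to avoid sampling a true fact.
    Returns None if every candidate would yield a known positive.
    """
    head, rel, tail = triple
    candidates = [
        e for e in entities
        if e != tail and (head, rel, e) not in known
    ]
    if not candidates:
        return None
    return (head, rel, rng.choice(candidates))

# Toy knowledge graph: placeholder protein and GO term identifiers.
known = {
    ("P1", "has_function", "GO:a"),
    ("P1", "has_function", "GO:b"),
}
go_terms = ["GO:a", "GO:b", "GO:c"]
rng = random.Random(0)
negative = corrupt_tail(("P1", "has_function", "GO:a"), go_terms, known, rng)
print(negative)
```

A discriminator would then be trained to score the known triples above such corrupted ones; how Prot-A-GAN adapts the sampling to the entity type is left to the paper itself.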

    Approches à base de graphes pour l’annotation de la fonction des protéines et la découverte des connaissances (Graph-based approaches for protein function annotation and knowledge discovery)

    Owing to recent advances in genomic sequencing technologies, the number of protein entries in public databases is growing exponentially. It is important to harness this huge amount of data to describe living things at the molecular level, which is essential for understanding human disease processes and accelerating drug discovery. A prerequisite, however, is that all of these proteins be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. Today, only a small fraction of proteins is functionally annotated and reviewed by expert curators, because manual curation is expensive and slow. Developing automatic protein function annotation tools is the way forward to reduce the gap between annotated and unannotated proteins and to predict reliable annotations for unknown proteins. Many tools of this type already exist, but none of them is fully satisfactory. We observed that only a few consider graph-based approaches and the domain composition of proteins. Domains are conserved regions across protein sequences of the same family. In this thesis, we design and evaluate graph-based approaches to automatic protein function annotation, and we explore the impact of domain architecture on protein functions. The first part is dedicated to protein function annotation using a domain similarity graph and a neighborhood-based label propagation technique. We present GrAPFI (Graph-based Automatic Protein Function Inference) for automatically annotating proteins with enzymatic functions (EC numbers) and GO terms from a protein-domain similarity graph. We validate the performance of GrAPFI using six reference proteomes from UniProtKB/SwissProt and compare GrAPFI results with state-of-the-art EC prediction approaches. We find that GrAPFI achieves better accuracy and comparable or better coverage. The second part of the dissertation deals with representation learning for biological entities. We first focus on neural network-based word embedding techniques and formulate the annotation task as a text classification task: we build a corpus in which each protein is a sentence composed of its domains, and we learn a fixed-dimensional vector representation for each protein. We then focus on learning representations from a heterogeneous biological network. We build a knowledge graph integrating different sources of information related to proteins and their functions, and we formulate function annotation as a link prediction task between proteins and GO terms. We propose Prot-A-GAN, a machine-learning model inspired by the Generative Adversarial Network (GAN), to learn vector representations of biological entities from the protein knowledge graph. We observe that Prot-A-GAN gives promising results in associating appropriate functions with query proteins. In conclusion, this thesis revisits the crucial problem of large-scale automatic protein function annotation in the light of innovative artificial intelligence techniques. It opens up wide perspectives, in particular for the use of knowledge graphs, which are today available in many fields other than protein annotation thanks to progress in data science.
